Suppose you wanted to measure the effectiveness of Metababoost™, a new weight-loss drink that has become very popular recently.
You conduct a large tracking survey in which you are able to re-interview the same people at various points over 12 months.
You are interested in how BMI changes amongst people who regularly consume Metababoost™, compared against those who do not. You try as best you can to make sure that these two groups are similar in terms of age, gender, initial BMI, etc.
After one year, you find that people who report regularly consuming Metababoost™ weigh, on average, 5kg less than the comparison group. This difference is statistically significant.
Illustration of potential outcomes for the change in BMI, depending on whether or not an individual consumes Metababoost™

| | No Metababoost™ | Metababoost™ | Difference |
|---|---|---|---|
| Alex | 31 | 26 | -5 |
| Bonnie | 23 | 23 | 0 |
| Colin | 31 | 21 | -10 |
| Danielle | 29 | 34 | 5 |
| Earl | 39 | 29 | -10 |
| Fiona | 38 | 38 | 0 |
| Gaston | 36 | 21 | -15 |
| Hermine | 37 | 32 | -5 |
| AVERAGE | 33 | 28 | -5 |
To say that Metababoost™ causes people to lose 5kg, we mean that in an imaginary counterfactual world where the people who actually drank Metababoost™ instead did not drink it, they would weigh 5kg more, on average.
Similarly, we could say that in a counterfactual world where the people who actually didn’t drink Metababoost™ had instead consumed it regularly, they would weigh 5kg less, on average.
This is the idea behind causation within the potential outcomes framework.
If we could observe everyone’s potential outcomes (as in the above table), then finding evidence of causation would be easy!
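As a minimal sketch of that calculation, the Python below takes the eight rows of the potential outcomes table and averages the individual differences. The variable names are illustrative choices, not part of the original material.

```python
from statistics import mean

# Potential outcomes copied from the table above:
# (outcome without Metababoost, outcome with Metababoost).
potential_outcomes = {
    "Alex": (31, 26),
    "Bonnie": (23, 23),
    "Colin": (31, 21),
    "Danielle": (29, 34),
    "Earl": (39, 29),
    "Fiona": (38, 38),
    "Gaston": (36, 21),
    "Hermine": (37, 32),
}

# Individual causal effect: outcome with the drink minus outcome without it.
individual_effects = {
    name: with_drink - without_drink
    for name, (without_drink, with_drink) in potential_outcomes.items()
}

# The average treatment effect is the mean of the individual effects.
ate = mean(individual_effects.values())

print(individual_effects)
print("Average treatment effect:", ate)
```

The individual effects vary considerably (Danielle's outcome is higher with the drink), even though the average matches the final row of the table.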
Of course, we cannot observe these counterfactual worlds, and thus our real-world data look something like this:
Illustration of observed outcomes for the change in BMI for people who do and don’t drink Metababoost™

| | No Metababoost™ | Metababoost™ | Difference |
|---|---|---|---|
| Alex | 31 | ? | ? |
| Bonnie | ? | 23 | ? |
| Colin | 31 | ? | ? |
| Danielle | 29 | ? | ? |
| Earl | ? | 29 | ? |
| Fiona | ? | 38 | ? |
| Gaston | ? | 21 | ? |
| Hermine | 37 | ? | ? |
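To make the missing-data problem concrete, here is a short Python sketch that encodes the observed table, with `None` standing in for each "?". The only comparison available is a difference between the two group averages; no individual difference can be computed. Whether that group difference is a good estimate of the causal effect depends on how people ended up in each column.

```python
from statistics import mean

# Observed outcomes copied from the table above.
# None marks a counterfactual outcome that we never get to see.
observed = {
    "Alex": (31, None),
    "Bonnie": (None, 23),
    "Colin": (31, None),
    "Danielle": (29, None),
    "Earl": (None, 29),
    "Fiona": (None, 38),
    "Gaston": (None, 21),
    "Hermine": (37, None),
}

# Split the observed values by column.
no_drink = [without for without, _ in observed.values() if without is not None]
drink = [with_ for _, with_ in observed.values() if with_ is not None]

# Every individual difference is unknown, because one value is always missing.
individual_effects_known = [
    name for name, (without, with_) in observed.items()
    if without is not None and with_ is not None
]

# All we can compute is the difference between the two group averages.
difference_in_means = mean(drink) - mean(no_drink)

print("People with a known individual effect:", individual_effects_known)
print("Difference in group means:", difference_in_means)
```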
How, then, do we estimate a causal effect?
Imagine you had a box with a large number of tickets inside. On each ticket is written a value from 0 to 50. Your task is to estimate the average value of the tickets in the box. You randomly choose 100 tickets from the box, and the average on these tickets is 35.
What is your best estimate for the average value of the tickets in the whole box?
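The ticket-box idea can be tried out in a small simulation. The sketch below is an illustration under assumed conditions: the box holds 10,000 tickets with values drawn uniformly from 0 to 50 (so its average will not be 35), and the seed is fixed only to make the run reproducible.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical box: 10,000 tickets, each showing a whole number from 0 to 50.
box = [rng.randint(0, 50) for _ in range(10_000)]
box_average = mean(box)

# Draw 100 tickets at random (without replacement) and average them.
sample = rng.sample(box, 100)
sample_average = mean(sample)

# Repeat the draw many times to see how the sample averages behave.
repeated_averages = [mean(rng.sample(box, 100)) for _ in range(2_000)]

print("Average of the whole box:", box_average)
print("Average of one sample of 100:", sample_average)
print("Mean of 2,000 sample averages:", mean(repeated_averages))
```

Any single sample average can miss the box average by a little, but across repeated draws the sample averages centre on it. This is why the sample average is a sensible best estimate.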
Returning to our working example, imagine you had a population of 1000 people. You randomly assign 500 of them to drink Metababoost™ for a year (and you make sure they actually do it). Let’s call these people the treatment group (T), and let’s call the change in BMI you measure for these people their treatment outcomes.
The other 500 randomly selected people are assigned to the control group (C), and you make sure that they do not consume any Metababoost™ during the year. Their changes in BMI constitute the control outcomes.
Just as you can use the values on your 100 randomly drawn tickets to estimate the average value of all of the tickets in the box, you can think of the observed treatment outcomes as a random sample of all potential treatment outcomes. Thus, the average of these treatment outcomes forms your estimate of the average of potential treatment outcomes.
Similarly, the average of your control outcomes forms your estimate of the average of potential control outcomes.
Taking the difference between these two estimates yields your estimate of the average treatment effect (ATE), that is, the average causal effect of Metababoost™.
NOTE: this only works because you have randomly allocated people into T and C.
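The following Python sketch simulates this randomized design. Everything about the simulated population is an assumption made for illustration: 1000 hypothetical people, control outcomes drawn from a normal distribution, and individual effects drawn from the eight differences in the potential outcomes table. In a simulation we can see both potential outcomes, so the estimate can be checked against the true ATE.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: each person has two potential outcomes.
# Assumption: effects are drawn from the differences in the first table.
population = []
for _ in range(1000):
    y_control = rng.gauss(33, 5)  # outcome without the drink
    effect = rng.choice([-15, -10, -10, -5, -5, 0, 0, 5])
    population.append((y_control, y_control + effect))

# Only knowable inside a simulation: the true average treatment effect.
true_ate = mean(y1 - y0 for y0, y1 in population)

def run_experiment(rng):
    """Randomly assign 500 people to T and 500 to C, then compare means."""
    ids = list(range(len(population)))
    rng.shuffle(ids)
    treatment, control = ids[:500], ids[500:]
    # We observe only the treated outcome in T and the control outcome in C.
    treatment_mean = mean(population[i][1] for i in treatment)
    control_mean = mean(population[i][0] for i in control)
    return treatment_mean - control_mean

one_estimate = run_experiment(rng)
estimates = [run_experiment(rng) for _ in range(200)]

print("True ATE:", true_ate)
print("Estimate from one experiment:", one_estimate)
print("Mean of 200 experimental estimates:", mean(estimates))
```

Individual experiments differ from the true ATE by chance, but the estimates centre on it across repeated random assignments.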
Small group discussion: Think now about Enos’ experiment.
Recall the fundamental problem of causal inference: we never observe both potential outcomes for the same person. This makes estimation especially difficult when people self-select into T or C.
To return to our working example, the people who choose to drink Metababoost™ may be different in terms of their potential outcomes from the people who choose not to drink it.
In this case, we can no longer consider the observed treatment/control outcomes as a random sample of all potential treatment/control outcomes. Thus our estimates of the ATE may be biased; that is, if we reran this experiment a large number of times, our estimates would tend to be either too large or too small.
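A small simulation can illustrate this bias. The sketch below rests on an invented selection rule, used purely for illustration: people whose outcome without the drink would be above 33 choose to drink Metababoost™ with probability 0.8, and everyone else with probability 0.2. The simulated population is hypothetical, built the same way as a randomized experiment would need it (two potential outcomes per person).

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical population: each person has two potential outcomes.
population = []
for _ in range(1000):
    y_control = rng.gauss(33, 5)  # outcome without the drink
    effect = rng.choice([-15, -10, -10, -5, -5, 0, 0, 5])
    population.append((y_control, y_control + effect))

true_ate = mean(y1 - y0 for y0, y1 in population)

def self_selected_estimate(rng):
    """Let people choose for themselves, then compare group means."""
    drinkers, non_drinkers = [], []
    for y0, y1 in population:
        # Invented rule: a higher outcome without the drink makes
        # choosing the drink more likely.
        p_drink = 0.8 if y0 > 33 else 0.2
        if rng.random() < p_drink:
            drinkers.append(y1)      # we observe only the treated outcome
        else:
            non_drinkers.append(y0)  # we observe only the control outcome
    return mean(drinkers) - mean(non_drinkers)

estimates = [self_selected_estimate(rng) for _ in range(200)]

print("True ATE:", true_ate)
print("Mean of 200 self-selected estimates:", mean(estimates))
```

Under this rule the drinkers start from higher outcomes than the non-drinkers, so the comparison of group means understates the benefit of the drink in every repetition on average. A different selection rule could push the bias in the other direction.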
Turning away from our Metababoost™ example, what if we were instead interested in the causal effect of (contextual) diversity?
Suppose your friend sees Enos’ results and says:
“Nice experiment, but he’s just documenting a temporary reaction to the unexpected appearance of Latinos in all-white suburbs. Over time, however, people are going to become more comfortable with diversity. For example, cities have historically been magnets for immigration, and the people living there seem to have no problem with diversity.”